Photography-based 3d modeling system and method, and automatic 3d modeling apparatus and method

ABSTRACT

The present disclosure discloses a photography-based 3D modeling system and method, and an automatic 3D modeling apparatus and method, including: (S1) attaching a mobile device and a camera to the same camera stand; (S2) obtaining multiple images used for positioning from the camera or the mobile device during movement of the stand, and obtaining a position and a direction of each photo capture point, to build a tracking map that uses a global coordinate system; (S3) generating 3D models on the mobile device or a remote server based on an image used for 3D modeling at each photo capture point; and (S4) placing the individual 3D models of all photo capture points in the global three-dimensional coordinate system based on the position and the direction obtained in S2, and connecting the individual 3D models of multiple photo capture points to generate an overall 3D model that includes multiple photo capture points.

BACKGROUND

Technical Field

The present disclosure relates to a 3D modeling system and method, and in particular to a photography-based 3D modeling system and method, and an automatic 3D modeling apparatus and method.

Description of the Related Art

To solve a technical problem, the present disclosure provides a photography-based three-dimensional space modeling solution, which can be used for single-space or multi-space 3D modeling and/or 2D floorplan generation.

There are mainly two conventional photography-based 3D modeling methods, both of which have obvious disadvantages.

In method (a), a camera that can record depth information is used to directly generate a 3D model. Such a method relies on complex hardware, resulting in high equipment costs, and usually needs to be operated by professional photographers. As a result, this method is poorly suited for wide adoption.

In method (b), two photos are captured respectively at two photo capture points that are close to each other. Preferably, the photo capture points are separated at the centimeter level or decimeter level, and feature point matching is performed and photo capture points are positioned successively. Then, Multi View Stereo (MVS) (for details, refer to https://github.com/cdcseacave/openMVS) is used for modeling. The advantage is that the entire process is fully automatic without manual intervention. However, the disadvantages are obvious.

Disadvantage 1: It is computation intensive, and as a result rapid modeling cannot be easily achieved on devices with limited computing resources, such as mobile devices. Photos usually need to be uploaded to a server (cloud/PC) to run the modeling algorithms, which benefit from stronger computing capacities.

Disadvantage 2: It is difficult to specify how far apart photo capture points should be. If the photo capture points are too dense, operations become inconvenient and time-consuming. If photo capture points are selected simply based on an unobstructed line of sight between two adjacent photo capture points or by “feeling right”, modeling may fail, and no warning can be provided to users during photo capture.

In addition, methods for reconstructing three-dimensional space scenes based on photography have been provided in the past. However, in most of these methods, 3D models cannot be automatically generated from the images used for 3D modeling, and tedious manual intervention is required to correct the 3D model of each space. In addition, the 3D models of multiple spaces cannot be automatically assembled, and therefore need to be manually edited by finding matching features through human observation, which is time-consuming and labor-intensive.

BRIEF SUMMARY

To overcome one or more of the above disadvantages of the conventional methods, the present disclosure uses innovative methods, namely, deep learning and image processing methods, to perform modeling for a single photo capture point. The modeling can be performed on a mobile device with a limited computing capability, or related data can be uploaded to a cloud server for modeling. In addition, in the case of rapid modeling by using the mobile device, to improve timeliness, only a room outline is modeled, and models of objects such as furniture and decorations are not restored. A photo capture point positioning system is built to place individual models of multiple photo capture points in the global coordinate system according to their positions and directions. Individual models of multiple photo capture points are optimized and properly connected, to generate an overall 3D model and an overall 2D floorplan.

The present disclosure supports a wide range of photo capture methods with low costs, including but not limited to a fisheye lens of a mobile phone, a panoramic camera, a camera with a fisheye lens, an ordinary mobile phone, an ordinary digital camera, etc.

Ordinary photo: a photo captured by using an ordinary digital camera (including an ordinary single-lens reflex (SLR) camera, a mirrorless camera, a point-and-shoot camera, etc.), a panoramic camera, a camera with a fisheye lens, an ordinary mobile phone, a mobile phone with a fisheye lens, or a video camera. Unlike binocular vision, for ordinary photos, three-dimensional information cannot be restored from two photos captured at the same photo capture point. Ordinary photos are hereinafter referred to as photos.

When using a panoramic camera, panoramic images are usually captured. Some computer vision and image processing algorithms, such as line detection, require converting a panoramic image into an undistorted image. The expressions of photos and pictures used below include panoramic photos and converted undistorted images.

The present disclosure provides a photography-based 3D modeling system and method, and an automatic 3D modeling apparatus and method, to support multiple photo capture devices, and automatically assemble 3D models of various photo capture points based on a relative position of each photo capture point and capture direction information of a camera lens that are obtained during photo capture, to generate an overall 3D model. In the present disclosure, a 2D floorplan can also be generated.

Specifically, the present disclosure provides a photography-based 3D modeling system, including: a photo capture unit, configured to capture a first image of each of multiple spaces; a 3D model generation unit, configured to generate a 3D model of each space based on the first image that is captured by the photo capture unit for each space; a capture position acquisition unit, configured to obtain position and capture direction information of the photo capture unit when capturing the first image of each space; and a 3D model assembling unit, configured to: based on the position and capture direction information, assemble the 3D models of the individual spaces in the global three-dimensional coordinate system to generate an overall 3D model that includes the individual spaces.

Further, the photo capture unit captures multiple second images when moving among the spaces; the capture position acquisition unit performs feature point matching based on the multiple second images to obtain relative displacement and/or capture direction information of each photo capture point, for example, to build a tracking map that includes all photo capture points in the global coordinate system, so as to obtain position and/or capture direction information of the photo capture unit when capturing the first image of the space in which the photo capture unit is located.

Further, the photo capture unit has one or more positioning-aware sensors and/or one or more direction-aware sensors; and the capture position acquisition unit obtains, based on positioning information and/or direction information provided by the photo capture unit when capturing a first image of a space in which the photo capture unit is located, position and/or capture direction information of the photo capture unit when capturing the first image of the space in which the photo capture unit is located.

Further, the photo capture unit captures multiple second images when moving among the spaces; the photo capture unit has one or more positioning-aware sensors and/or one or more direction-aware sensors; and the capture position acquisition unit performs feature point matching based on images at adjacent photo capture points among the multiple second images captured by the photo capture unit, to obtain relative displacement and capture direction information of each photo capture point, for example, by building a tracking map that includes all photo capture points in the global coordinate system, and correcting the tracking map based on positioning information and/or direction information provided by the photo capture unit when capturing a first image of a space in which the photo capture unit is located, so as to obtain position and/or capture direction information of the photo capture unit when capturing the first image of the space in which the photo capture unit is located.

Further, the capture position acquisition unit corrects the relative displacement (from which the tracking map is generated) and/or capture direction information based on displacement information such as acceleration information and velocity information provided by one or more displacement-aware sensors (which may include, for example, an acceleration sensor and a velocity sensor) of the photo capture unit.

Further, the 3D model assembling unit converts local coordinates of the 3D model of each individual room into global coordinates, for example, by using a transformation matrix based on the position and capture direction information obtained by the capture position acquisition unit when each room is captured, so as to obtain the overall 3D model of all photo capture points.

Further, the method for converting local coordinates of the 3D model of a single room into global coordinates includes: enabling the photo capture unit to move a predetermined distance, and obtaining, by the capture position acquisition unit, coordinates of two endpoints of the predetermined distance, where a ratio of the difference between the coordinates of the two endpoints to the predetermined distance is the scale of the local coordinates to the global coordinates; or estimating, by using one or more feature points identified by the capture position acquisition unit, a ratio of the height of the plane of the floor or ceiling of the space to the actual height of the photo capture unit, to obtain the scale of the local coordinates to the global coordinates.

Further, before performing photo capture at a first photo capture point or during movement of subsequent photo capture, the photo capture unit moves a predetermined distance to obtain a predetermined quantity of the feature points.

Further, the photo capture unit has binocular lenses, and the binocular lenses separately capture the first image at the same photo capture point; and the 3D model generation unit compares the first images that are captured by the binocular lenses, determines corresponding pixels, and obtains depth information of each corresponding pixel, so as to generate the 3D model.

Further, the 3D model generation unit predicts a depth of each pixel in the first image by using a deep learning method, and calculates a normal direction of each pixel or predicts the normal direction of each pixel by directly using the deep learning method, so as to generate a 3D model of each space.

Further, the photo capture unit is implemented by a camera and/or a mobile device such as a mobile phone with a photo capture function; the 3D model generation unit is implemented by the mobile phone or by a remote server; when being implemented by the remote server, the 3D model generation unit receives, through a network, one or more first images that are captured and sent by the camera and/or the mobile phone with a photo capture function, to generate a 3D model of each space; the capture position acquisition unit is implemented by the camera or the mobile phone; and the 3D model assembling unit is implemented by the mobile phone or by a remote server; when being implemented by the remote server, the 3D model assembling unit receives, through a network, the position and capture direction information of each space sent by the capture position acquisition unit, completes the assembling processing based on the position and capture direction information, and sends the generated overall 3D model to the mobile phone or another device.

Further, the camera and the mobile phone with a photo capture function for implementing the photo capture unit are attached to the same camera stand; and during movement of the stand, multiple second images captured by the camera or the mobile phone with a photo capture function are obtained, so as to obtain position and capture direction information of the camera or the mobile phone with a photo capture function when capturing the first image of the space in which the camera or the mobile phone is located.

Further, based on a positioning system of the camera or the mobile phone with a photo capture function, the second images captured by the camera or the mobile phone with a photo capture function are used, and feature point matching is performed based on second images at adjacent photo capture points to obtain relative displacement and capture direction information of each photo capture point, thereby providing a relative position and direction of each photo capture point.

Further, before capturing the first image of the first space or during movement of subsequent photo capture, the photo capture unit obtains an angle between the capture direction of a lens of the camera and the capture direction of the mobile phone by using one or more of the following methods:

herein, the capture direction of the lens of the camera may be a direction of one of two fisheye lenses (front and rear) of a common panoramic camera, or may be a direction of a lens for capturing the first photo by a panoramic camera that captures multiple photos needed for one complete panoramic image by rotating one lens;

(1) simultaneously running a positioning system based on the mobile phone and a positioning system based on the camera, and moving the stand by a specific distance; in such case, the two systems each provide one displacement vector, and an angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile phone;

(2) specifying an angle consistent with the capture direction of the mobile phone by manually rotating a preview image or a captured image of the camera;

(3) matching preview images or captured images of the mobile phone and the camera by using an image recognition algorithm, to identify the angle;

(4) using an additional mark (including adding a mark to the stand which is at a known fixed angle with a mounting direction of the mobile phone), and then identifying the mark in the preview image or the image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile phone; and

(5) using a camera installation interface on the stand so that a known fixed angle is formed between the camera and the mobile phone (mobile device).

Further, the space is a room; the first image is an indoor image of the room; and the 3D model generation unit identifies one or more image areas of at least one of a floor, a ceiling, and a wall in the first image based on a deep learning method; divides the identified image area(s) into blocks based on an image processing technology, where each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generates the 3D model by solving an equation for each plane, where for two planes that intersect in the first image, an error between a calculated intersecting line and an actually observed intersecting line is minimized.

Further, the 3D model generation unit uses a computer vision algorithm to identify wall corners in the indoor image and connect the wall corners to generate a rough model of the room.

Further, the 3D model assembling unit performs a correction on the 3D models of the multiple rooms, including correcting wall line directions of all rooms by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within an error range; and when assembling the 3D models of the rooms, the 3D model assembling unit corrects one or more overlapping parts and/or gaps.

Further, the photography-based 3D modeling system according to the present disclosure further includes a 2D floorplan generation unit, configured to generate a 2D floorplan in the following ways: projecting each surface of the generated 3D model onto a plane parallel to the floor, and merging these projections into a polygon; correcting and simplifying the obtained polygon, including at least one of the following methods: (1) retaining only main vertices of the polygon and deleting small concave or convex rectangles; and (2) using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; assembling the generated 2D floorplans of the rooms in the same two-dimensional coordinate system based on the position and capture direction information of each space obtained by the capture position acquisition unit, to generate an overall 2D floorplan from the individual 2D floorplans of the rooms; and identifying and marking a position of a door and/or a window, including identifying the position of the door and/or the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property.

Further, the 2D floorplan generation unit performs a correction on the 2D floorplans of the multiple rooms, including correcting wall line directions of all rooms by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within a specific error range; and when assembling the 2D floorplans of the rooms, the 2D floorplan generation unit corrects one or more overlapping parts and/or gaps.

Further, the photography-based 3D modeling system according to the present disclosure can also include a 2D floorplan generation unit, configured to generate a 2D floorplan in the following ways: projecting each surface of the overall 3D model generated by the 3D model assembling unit onto a plane parallel to the floor, and merging these projections into one or more polygons; correcting and simplifying the obtained polygon(s), including at least one of the following methods: (1) retaining only main vertices of the polygon and deleting small concave or convex rectangles; and (2) using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; and identifying and marking a position of a door and/or a window, including identifying the position of the door and/or the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property.

In addition, the present disclosure further provides an automatic 3D modeling apparatus, including: a 3D model generation unit, configured to: based on a first image of each of multiple spaces included in a modeling object, generate a 3D model of each space; and a 3D model assembling unit, configured to: based on position and capture direction information when the first image of each of the multiple spaces is captured, assemble the 3D models of the spaces generated by the 3D model generation unit in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the spaces.

In addition, the present disclosure further provides an automatic 3D modeling method, including: a 3D model generation step: based on a first image of each of multiple spaces included in a modeling object, generating a 3D model of each space; and a 3D model assembling step: based on position and capture direction information when the first image of each of the multiple spaces is captured, assembling the 3D models of the spaces generated in the 3D model generation step in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the spaces.

In addition, the present disclosure further provides a photography-based 3D modeling method, including the following steps: (S1) attaching a mobile device with a photo capture function and a camera onto the same camera stand; (S2) obtaining multiple second images from the camera or the mobile device during movement of the stand, and obtaining a position and a capture direction of each photo capture point by optionally using one or more sensors of the camera or the mobile device, to build a tracking map that uses a global coordinate system; (S3) generating 3D models on the mobile device or a remote server based on a first image captured at each photo capture point; and (S4) placing the individual 3D models of all photo capture points in the global three-dimensional coordinate system based on the position and the capture direction obtained in S2, and connecting the individual 3D models of multiple photo capture points to generate an overall 3D model that includes multiple photo capture points.

Further, step S2 uses a positioning system of the mobile device or the camera and performs feature point matching based on second images captured by the mobile device or the camera at adjacent photo capture points, to identify relative displacement and capture direction information of the photo capture points, in order to build a tracking map that includes all photo capture points in the same coordinate system and provides a position and a direction of each photo capture point.

Further, step S2 further includes correcting the tracking map based on information that includes acceleration, velocity, and direction of movement, obtained by using one or more sensors of the mobile device or the camera.

Further, step S2 further includes obtaining an angle between a capture direction of a lens of the camera and a capture direction of the mobile device, where at an initialization stage, the positioning system based on the mobile device and the positioning system based on the camera run simultaneously, and the camera stand is moved by a specific distance; in such case, the two systems each provide one displacement vector, and an angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile device; an angle consistent with the capture direction of the mobile device is specified by manually rotating a preview image or a captured image of the camera; preview images or captured images of the mobile device and the camera are matched by using an image recognition algorithm, to identify the angle; or an additional mark is used (including adding a mark to the stand to form a known fixed angle with a mounting direction of the mobile device), and then the mark is identified in the preview image or the image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile device.

Further, step S3 includes: (S31) identifying one or more image areas of at least one of a floor, a ceiling, and a wall in the image based on a deep learning method; and (S32) dividing the identified image area(s) into blocks based on an image processing technology, where each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generating the 3D model by solving an equation for each plane, where for two planes that intersect in the image, an intersecting line of the two planes is used as a constraint, so that an error between a calculated intersecting line and an actually observed intersecting line is minimized.

Further, step S3 further includes: using a computer vision algorithm to identify wall corners in an indoor image, and connecting the wall corners to generate a rough model of a room.

Further, step S4 includes: (S41) converting local coordinates of a 3D model of a single photo capture point into global coordinates, for example, by using a transformation matrix based on the position and the capture direction of each photo capture point, so as to obtain an overall 3D model of all photo capture points; (S42) performing a correction on the 3D models of multiple photo capture points, including correcting wall line directions of all photo capture points by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within a specific error range; and (S43) when assembling the 3D models of the photo capture points, correcting one or more overlapping parts and/or gaps.

In comparison with existing technologies, the present disclosure can achieve one or more of the following beneficial effects: multiple photo capture devices are supported; tasks such as 3D modeling and assembling can be executed on both a device with limited computing capability, such as a mobile device, and a remote server; 3D models of various photo capture points can be automatically assembled based on an obtained relative position of each photo capture point and obtained capture direction information of a camera lens, to generate an overall 3D model; and a 2D floorplan can also be generated as needed. The present disclosure achieves a high success rate for 3D model generation; needs as few as one panoramic image for each room, and is therefore highly efficient with a good user experience; achieves high modeling efficiency by supporting both rapid modeling during photo capture and accurate modeling on a remote server; provides a WYSIWYG (what you see is what you get) experience, so that a user can select a new photo capture point by referring to a result of rapid modeling, so as to prevent any missed photo captures; and avoids interference from objects such as furniture, helping generate accurate floorplans.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is an architectural diagram illustrating an example system to which the present disclosure can be applied.

FIG. 2 is a schematic structural diagram illustrating an implementation of a photography-based 3D modeling system according to the present disclosure.

FIG. 3 is a schematic structural diagram illustrating another implementation of a photography-based 3D modeling system according to the present disclosure.

FIG. 4 is a schematic flowchart illustrating an implementation of a photography-based 3D modeling method according to the present disclosure.

FIG. 5 is a schematic structural diagram illustrating an implementation of an automatic 3D modeling apparatus according to the present disclosure.

FIG. 6 is a schematic structural diagram illustrating another implementation of an automatic 3D modeling apparatus according to the present disclosure.

FIG. 7 is a schematic flowchart illustrating an implementation of an automatic 3D modeling method according to the present disclosure; and

FIG. 8 is a schematic structural diagram illustrating an implementation of an electronic device according to the present disclosure.

With reference to the accompanying drawings and specific implementations, the above and other features, advantages and aspects of implementations of the present disclosure become clearer. Same or similar reference numerals in the accompanying drawings represent same or similar elements. It should be understood that the accompanying drawings are examples; components and elements are not necessarily drawn to scale.

DETAILED DESCRIPTION

Unless otherwise defined, the technical and scientific terms used in this specification have the same meanings as those commonly understood by a person skilled in the art of the present disclosure. The terms used in the specification of the present application are merely intended for the purpose of describing the specific implementations, but not intended to limit the present disclosure. The terms “include” and “have” and any other variants thereof in the specification, the claims, and the accompanying drawings of the present disclosure are intended to cover non-exclusive inclusion. In the specification and the claims, or the accompanying drawings of the present disclosure, the terms “first”, “second”, and the like are intended to distinguish between different objects but do not indicate a particular order.

Mentioning an “implementation” in the specification means that a particular characteristic, structure, or feature described with reference to the implementation can be included in at least one implementation of the present disclosure. The word appearing in various locations in the specification does not necessarily refer to the same implementation, and is not an independent or alternate implementation exclusive of other implementations. It is explicitly and implicitly understood by a person skilled in the art that the implementations described in the specification can be combined with another implementation.

To make a person skilled in the art understand the solutions in the present disclosure better, the following further describes the present disclosure with reference to the accompanying drawings and the implementations.

System Structure

A system structure in an implementation of the present disclosure is first described. As shown in FIG. 1, a system structure 100 can include mobile devices 101, 102, 103, and 104, a network 105, and a server 106. The mobile devices 101, 102, 103, and 104 and the server 106 are connected to one another via the network 105.

In the present implementation, the mobile device 101, 102, 103, or 104 shown in FIG. 1 can transmit various information through the network 105. The network 105 can include various connection types, such as wired and wireless communication links, or fiber optic cables. It should be noted that, the above wireless connection methods can include but are not limited to a 3G/4G/5G connection, a Wi-Fi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB connection, a local area network (“LAN”), a wide area network (“WAN”), an internetwork (for example, the Internet), an end-to-end network (for example, an ad hoc end-to-end network), and other network connection methods that are currently known or will be developed in the future. The network 105 can communicate using any network protocol that is currently known or will be developed in the future, such as the Hyper Text Transfer Protocol (HTTP), and can interconnect with digital and data communication, for example, a communications network, of any form or medium.

A user can use the mobile devices 101, 102, 103, and 104 to interact with the server 106 via the network 105, to receive or send messages, etc. Various client applications can be installed on the mobile device 101, 102, 103, or 104, such as live video and playback applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, etc.

The mobile device 101, 102, 103, or 104 may be any electronic device that has a touchscreen and/or supports web browsing, and has a photo capture function, including but not limited to mobile terminals such as a smartphone, a tablet computer, an e-book reader, a moving picture experts group audio layer-3 (MP3) player, a moving picture experts group audio layer-4 (MP4) player, a head-mounted display device, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a portable multimedia player (PMP) and an in-vehicle terminal, as well as a digital TV, a desktop computer, etc.

The server 106 may be a server that provides various services, such as a back-end server that supports 3D modeling on the mobile device 101, 102, 103, or 104.

It should be understood that the quantities of mobile devices, networks, and servers in FIG. 1 are merely examples. Depending on implementation needs, there can be any quantities of mobile devices, networks, and servers.

Herein, the mobile device can be attached to a stand, such as a tripod, independently or jointly with another electronic terminal device such as a camera, to cooperate with applications running in the Android system to implement the implementation method in the present disclosure, or to cooperate with applications running in other operating systems such as the iOS system, the Windows system, and HarmonyOS to implement the implementation method in the present disclosure.

Photography-Based 3D Modeling System

FIG. 2 is a schematic structural diagram illustrating an implementation of a photography-based 3D modeling system according to the present disclosure. As shown in FIG. 2, the photography-based 3D modeling system in the present implementation includes: a photo capture unit 201, configured to capture a first image of each of multiple spaces. Herein, the first image may be, for example, an image used for 3D modeling, including an ordinary photo, a panoramic photo, and a processed (for example, undistorted) panoramic photo. The photo capture unit 201 can be implemented by a photo capture module in the mobile device.

Herein, the photo capture unit 201 can capture multiple second images when moving among the spaces. Herein, the second images may be, for example, images used for positioning, including an ordinary photo, a panoramic photo, and a processed (for example, undistorted) panoramic photo. Herein, the first image and the second image may be the same image, partially identical images, or different images, which is not limited. The image used for positioning herein may also be a photo, a preview image, a video frame, etc., captured by the photo capture unit 201, and may be stored, or may not be stored and instead used only to identify and match feature points.

Herein, for example, the photo capture unit 201 has a positioning sensor and a direction sensor, and can obtain positioning information and direction information when capturing an image used for 3D modeling of the space in which the photo capture unit 201 is located. Here, the positioning sensor may be, for example, one or more of an acceleration sensor, a gyroscope, a linear acceleration sensor, an angular velocity sensor, a gravity sensor, and the like. The direction sensor may be, for example, one or more of a direction sensor, a magnetic sensor, and the like.

A 3D model generation unit 202 is configured to generate a 3D model of each space based on the image used for 3D modeling that is captured by the photo capture unit 201 for each space.

In one or more implementations, for example, the photo capture unit 201 has binocular lenses, and the binocular lenses separately capture the images used for 3D modeling at the same photo capture point; and the 3D model generation unit 202 compares the images used for 3D modeling that are captured by the binocular lenses, determines corresponding pixels, and obtains depth information of each corresponding pixel, so as to generate the 3D model.
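As an illustration of the binocular approach, the following is a minimal sketch (not the disclosed implementation itself) that uses the OpenCV library to match corresponding pixels between a rectified image pair and convert the resulting disparity into depth; the file names and calibration values are assumptions made for the example only.

    import cv2
    import numpy as np

    # Rectified left/right views captured by the binocular lenses (hypothetical file names).
    left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

    # Semi-global block matching finds corresponding pixels and yields a disparity map.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=9)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

    # Depth per pixel from disparity; focal length (pixels) and baseline (meters) are assumed values.
    focal_px, baseline_m = 1400.0, 0.12
    depth = np.divide(focal_px * baseline_m, disparity,
                      out=np.zeros_like(disparity), where=disparity > 0)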

Certainly, in one or more implementations, for example, the 3D model generation unit 202 can further predict a depth of each pixel or depths of some pixels in the image used for 3D modeling by using a deep learning method, and calculate a normal direction of each pixel or normal directions of some pixels, or predict the normal direction of each pixel or the normal directions of some pixels by directly using the deep learning method, so as to generate a 3D model of each space.

Herein, in one or more implementations, the method for predicting the depth of each pixel in the image used for 3D modeling or predicting the normal direction of each pixel by using the deep learning method may be, for example, a method for training a plane-aware convolutional neural network by predicting a dense depth, a surface normal, and a plane boundary from a single indoor 360° image (for example, refer to Pano Popups: Indoor 3D Reconstruction with a Plane-Aware Network); or a method for predicting a depth from a 360° image through end-to-end learning by using a large-scale three-dimensional dataset, for example, using an approach as described in OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas or other suitable approaches.

A capture position acquisition unit 203 is configured to obtain position and capture direction information of the photo capture unit 201 when capturing the image used for 3D modeling of each space, and certainly can further obtain a focal length of the lens, a scanning interval of the lens, and other parameters that can affect image content capture (for example, settings for a focal length, a wide-angle lens, or a telephoto lens; if these parameters are incorrect, identification or relative sizes of image content features may be incorrect).

Herein, for example, the capture position acquisition unit 203 can perform feature point matching based on images at adjacent photo capture points among the multiple images used for positioning that are captured by the photo capture unit 201, to obtain relative displacement and capture direction information of each photo capture point, for example, can build a tracking map that includes all photo capture points in the same coordinate system, so as to obtain position and capture direction information of the photo capture unit 201 when capturing the image used for 3D modeling of the space in which the photo capture unit 201 is located.
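As an illustrative sketch of this feature-matching step, the following assumes calibrated pinhole images at two adjacent photo capture points and uses OpenCV to recover the relative rotation (change in capture direction) and a unit-length relative displacement between them; the intrinsic matrix and file names are assumptions for the example only, and scale recovery is addressed separately below.

    import cv2
    import numpy as np

    img1 = cv2.imread("point_a.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical adjacent capture points
    img2 = cv2.imread("point_b.jpg", cv2.IMREAD_GRAYSCALE)
    K = np.array([[1400.0, 0.0, 960.0], [0.0, 1400.0, 540.0], [0.0, 0.0, 1.0]])  # assumed intrinsics

    # Detect and match feature points between the two images.
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix -> relative rotation R and unit translation t between the two capture points.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)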

Herein, for example, the capture position acquisition unit 203 can further obtain, based on positioning information and direction information provided by the photo capture unit 201 when capturing an image used for 3D modeling of a space in which the photo capture unit 201 is located, position and capture direction information of the photo capture unit 201 when capturing the image used for 3D modeling of the space in which the photo capture unit 201 is located.

Herein, the capture position acquisition unit 203 further corrects the tracking map formed by relative displacement and capture direction information based on displacement information such as acceleration information and velocity information or other action/motion information provided by sensors of the photo capture unit 201, including a displacement sensor such as an acceleration sensor or a velocity sensor, and a gyroscope, a barometric pressure sensor or another motion sensor.

A 3D model assembling unit 204 is configured to: based on the position and capture direction information of each space obtained by the capture position acquisition unit 203, assemble the 3D models of the spaces generated by the 3D model generation unit 202 in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the spaces.

Herein, the 3D model assembling unit 204 can further convert local coordinates of the 3D model of a single room into global coordinates, for example, by using a transformation matrix based on the position and capture direction information obtained by the capture position acquisition unit 203 when each room is captured, so as to obtain the overall 3D model of all photo capture points.
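For illustration only, the following minimal sketch converts a room's local model coordinates into the global coordinate system, assuming the capture direction is expressed as a yaw angle about the vertical axis and the scale is recovered as described in the following paragraphs; the function and variable names are illustrative, not part of the disclosure.

    import numpy as np

    def local_to_global(points, yaw_rad, position, scale):
        """points: (N, 3) local coordinates; returns (N, 3) global coordinates."""
        c, s = np.cos(yaw_rad), np.sin(yaw_rad)
        R = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])            # rotation about the vertical (z) axis
        T = np.eye(4)
        T[:3, :3] = scale * R                       # scale, then rotate
        T[:3, 3] = np.asarray(position)             # translate to the capture position
        homog = np.hstack([points, np.ones((len(points), 1))])
        return (T @ homog.T).T[:, :3]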

Herein, the method for converting local coordinates of the 3D model of a single room into global coordinates includes: enabling the photo capture unit 201 to move a predetermined distance, and obtaining, by the capture position acquisition unit 203, coordinates of two endpoints of the predetermined distance (for example, one meter), where a ratio of a difference between the coordinates of the two endpoints to the predetermined distance is the scale of the local coordinates to the global coordinates; or estimating, by using a feature point identified by the capture position acquisition unit 203, a ratio of a height of a plane on which a floor or a ceiling of the space is located to a height of the photo capture unit 201, to obtain the scale of the local coordinates to the global coordinates. Before performing photo capture at a first photo capture point or during movement of subsequent photo capture, the photo capture unit 201 moves a predetermined distance to obtain a predetermined quantity of the feature points.

Herein, for example, the method for estimating the ratio of the height of the plane on which the floor or the ceiling of the space is located to the height of the photo capture unit 201 is projecting the photo capture point vertically onto the floor plane, and then connecting the feature points, e.g., on the floor, so that these three points form a triangle. Assume that the projection line is L1, the line from the photo capture point to the feature point is L2, and the line from the projection point to the feature point is L3. The angle between L1 and L2 is known, e.g., based on the characteristics of the panoramic image; L1 can be calculated by using a trigonometric function based on the length of L3 and the above angle, and a scale is calculated based on the actual height of the camera, as illustrated below.
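A numerical example of this triangle, with illustrative values only, is as follows: the angle between L1 and L2 is read from the panoramic image, L3 is measured in local map units, and the known camera height then fixes the scale.

    import math

    angle_l1_l2 = math.radians(35.0)    # assumed angle between L1 (vertical) and L2 (ray to the feature point)
    l3_local = 0.9                      # assumed length of L3 in local (unitless) coordinates
    l1_local = l3_local / math.tan(angle_l1_l2)   # right-triangle relation: tan(angle) = L3 / L1

    camera_height_m = 1.5               # actual height of the camera above the floor
    scale = l1_local / camera_height_m  # local units per meter; divide local coordinates by this value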

Herein, the predetermined distance needs to be long enough to obtain a predetermined quantity of feature points.

Specifically, in one or more implementations, for example, the photo capture unit 201 uses only a camera or a mobile phone camera. Because the obtained coordinates are all relative values, the coordinates need to be converted into absolute values. In other words, an image comparison algorithm usually has no accurate scale; the coordinates are relative and have no specific size. As a result, displacements and scales calculated from different pictures are inconsistent, causing misalignment. During actual implementation, the above method for converting the coordinates may be as follows (a brief numerical sketch of both options is given after item (b)):

(a) making a user move a specified distance, for example, one meter, and obtaining coordinates of two endpoints of the movement distance, where a ratio of a difference between the coordinates of the two endpoints to the movement distance is the scale of local coordinates to global coordinates; and

(b) estimating, based on a feature point identified by the system, a plane on which a floor or a ceiling of a room is located. Assume that a vertical coordinate axis in the coordinate system is a z-axis, and an equation of the plane is z=a. Because the height of the photo capture unit 201 is known, or a height from the photo capture unit 201 to a ceiling is known, which is h, a/h is the scale of the local coordinates to the global coordinates. Herein, because a specific quantity of feature points on the same plane, e.g., floor or ceiling, need to be identified to estimate a value of a, an initialization process can be used during implementation, that is, moving a sufficiently long distance, for example, more than two meters, so that adequate feature points can be accumulated in different environments. The initialization process can be performed prior to the first photo capture point. If the initialization fails, it can be performed again without affecting subsequent photo capture. Alternatively, the initialization process can be performed during movement among subsequent photo capture points.
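The following minimal sketch illustrates both options (a) and (b) with assumed numbers; it is an example of the calculation only, not the disclosed implementation.

    import numpy as np

    # (a) The user moves a specified distance (one meter) between two tracked positions.
    p_start = np.array([0.12, 0.05, 0.0])     # local coordinates at the first endpoint
    p_end = np.array([0.71, 0.08, 0.0])       # local coordinates at the second endpoint
    moved_m = 1.0
    scale_a = np.linalg.norm(p_end - p_start) / moved_m   # local units per meter

    # (b) Estimate the floor plane z = a from feature points accumulated on the floor,
    # then compare it with the known camera height h.
    floor_points = np.random.default_rng(0).normal([0.0, 0.0, -0.45], [1.0, 1.0, 0.01], (50, 3))
    a = abs(np.median(floor_points[:, 2]))    # robust estimate of the plane height in local units
    h = 1.5                                   # known camera height in meters
    scale_b = a / h                           # a/h, the scale of local to global coordinates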

In the present implementation, for example, the photo capture unit 201 can be implemented by a camera and/or a mobile phone with a photo capture function.

In one or more implementations, for example, the camera and the mobile phone with a photo capture function for implementing the photo capture unit 201 can be attached to the same camera stand; and during movement of the stand, multiple images used for positioning captured by the camera or the mobile phone with a photo capture function are obtained, so as to obtain position and capture direction information of the camera or the mobile phone with a photo capture function when capturing the image used for 3D modeling of the space in which the camera or the mobile phone is located.

Herein, based on a positioning system of the camera or the mobile phone with a photo capture function, the images used for positioning captured by the camera or the mobile phone with a photo capture function can be further used, and feature point matching can be performed based on images used for positioning at adjacent photo capture points to obtain relative displacement and capture direction information of each photo capture point, thereby providing a relative position and direction of each photo capture point.

In one or more implementations, because a position, a direction and a tracking map of the photo capture point are obtained through the mobile phone, and because the camera can be attached to the top of the camera stand by using a screw, the angle between the camera and the mobile phone may be different for each mounting, but the angle remains unchanged during the photo capture of a house. The 3D model of an individual room needs to be rotated by this angle, and then put into the global coordinates based on a position and a capture direction obtained by the mobile phone, to generate an overall 3D model.

Herein, before capturing the image used for 3D modeling of the first space or during movement of subsequent photo capture, the photo capture unit 201 can obtain an angle between a capture direction of a lens of the camera and a capture direction of the mobile phone by using one or more of the following methods:

herein, the capture direction of the lens of the camera may be a direction of one of two fisheye lenses, e.g., front and rear, of a common panoramic camera, or may be a direction of a lens for capturing the first photo by a panoramic camera that captures multiple photos by rotating one lens;

(1) simultaneously running a positioning system based on the mobile phone and a positioning system based on the camera, and moving the stand by a specific distance; in such case, the two systems each provide one displacement vector, and an angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile phone (a numerical sketch of this method is given after this list);

(2) specifying an angle consistent with the capture direction of the mobile phone by manually rotating a preview image or a captured image of the camera;

(3) matching preview images or captured images of the mobile phone and the camera by using an image recognition algorithm, to identify the angle; herein, a possible implementation method for identifying the angle may include at least one of the following ways:

calculating feature points in the images captured by the mobile phone and the camera, for example, using the scale-invariant feature transform (SIFT) to find the position difference of the matching feature points in the two images, in order to calculate the angle between the capture directions of the two lenses; or

building visual simultaneous localization and mapping (VSLAM) systems respectively by using video streams captured by the two lenses, where the angle between the displacement of the cameras in the two systems is the angle between the capture directions of the lenses;

(4) using an additional mark (including adding a mark to the stand to form a known fixed angle with a mounting direction of the mobile phone), and then identifying the mark in the preview image or the image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile phone; and

(5) using a camera installation interface on the stand so that a known fixed angle is formed between the camera and the mobile phone (mobile device).
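For method (1), a minimal numerical sketch is as follows; the displacement vectors are illustrative values of the kind that would be reported by the two positioning systems while the stand is moved.

    import numpy as np

    phone_disp = np.array([0.82, 0.31])    # displacement reported by the phone-based positioning system
    camera_disp = np.array([0.55, 0.69])   # displacement reported by the camera-based positioning system

    # Signed angle from the phone's displacement vector to the camera's displacement vector.
    angle = np.arctan2(camera_disp[1], camera_disp[0]) - np.arctan2(phone_disp[1], phone_disp[0])
    angle_deg = np.degrees((angle + np.pi) % (2.0 * np.pi) - np.pi)   # wrapped to [-180, 180)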

Certainly, herein, the position, the direction and the tracking map of the photo capture point can also be calculated from the camera images. In such case, the calculation of the 3D model does not depend on the angle between the camera and the mobile phone. In this case, the mobile phone does not need to be attached to the stand.

Herein, if the camera also has a direction sensor, the angle can be calculated by directly obtaining the directions of the camera and the mobile phone.

The 3D model generation unit 202 is implemented by the mobile phone or by a remote server; when being implemented by the remote server, the 3D model generation unit receives, through a network, one or more images used for 3D modeling, and/or one or more images used for positioning that are captured and sent by the camera and/or the mobile phone with a photo capture function, and/or information obtained by one or more motion sensors, to generate a 3D model of each space.

For example, the capture position acquisition unit 203 can be implemented by the camera or the mobile phone.

For example, the 3D model assembling unit 204 can be implemented by the mobile phone or by a remote server; when being implemented by the remote server, the 3D model assembling unit 204 receives, through a network, the position and capture direction information of each space sent by the capture position acquisition unit 203, completes the assembling processing based on the position and capture direction information, and sends the generated overall 3D model to the mobile phone or another device.

FIG. 3 is a schematic structural diagram illustrating another implementation of a photography-based 3D modeling system according to the present disclosure. As shown in FIG. 3, in the photography-based 3D modeling system in the present implementation, for example, a photography-based 3D modeling space is a room, and an image used for 3D modeling is an indoor image of the room. The photography-based 3D modeling system includes the following:

a photo capture unit 301, configured to capture an image used for 3D modeling of each of multiple rooms.

Herein, the photo capture unit 301 can capture multiple images used for positioning when moving among the rooms.

Herein, for example, the photo capture unit 301 has a positioning sensor and a direction sensor, and can obtain positioning information and direction information when capturing an image used for 3D modeling of the room in which the photo capture unit 301 is located.

A 3D model generation unit 302 is configured to generate a 3D model of each room based on the image used for 3D modeling that is captured by the photo capture unit 301 for each room.

Herein, the 3D model generation unit 302 identifies one or more image areas of at least one of a floor, a ceiling, and a wall in the image used for 3D modeling based on a deep learning method; divides the identified image area(s) into blocks based on an image processing technology, where each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generates the 3D model by solving an equation for each plane, where for two planes that intersect in the image used for 3D modeling, an error between a calculated intersecting line and an actually observed intersecting line is minimized.
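As a simplified illustration of one step in this procedure, the following sketch assumes an equirectangular panorama captured at a known camera height: a pixel labeled as floor defines a viewing ray, and intersecting that ray with the horizontal floor plane gives its 3D position, from which the footprint of the floor block (and hence the placement of adjoining vertical wall planes) can be derived. The parameters are illustrative assumptions, not the disclosed implementation.

    import numpy as np

    def floor_point_from_pixel(u, v, width, height, cam_height):
        """u, v: pixel coordinates in an equirectangular image; returns an (x, y, z) floor point."""
        lon = (u / width) * 2.0 * np.pi - np.pi           # longitude in [-pi, pi)
        lat = np.pi / 2.0 - (v / height) * np.pi           # latitude in [-pi/2, pi/2]
        ray = np.array([np.cos(lat) * np.cos(lon),
                        np.cos(lat) * np.sin(lon),
                        np.sin(lat)])                      # unit viewing ray from the capture point
        if ray[2] >= 0:                                    # the ray does not hit the floor
            return None
        t = -cam_height / ray[2]                           # solve t * ray_z = -cam_height
        return t * ray                                     # 3D point on the floor plane z = -cam_height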

Herein, the 3D model generation unit 302 further uses a computer vision algorithm to identify wall corners in the indoor image and connect the wall corners to generate a rough model of the room.

Herein, in one or more implementations, for example, the method for identifying wall corners in the image may be using the self-supervised training framework of interest point detection and description, for example, using an approach as described in SuperPoint: Self-Supervised Interest Point Detection and Description or other suitable approaches, and then connecting the wall corners to generate a rough model of the room, so as to capture a geometric relationship between objects such as wall corners that frequently appear in the same three-dimensional space structure.

A capture position acquisition unit 303 is configured to obtain position and capture direction information of the photo capture unit 301 when capturing the image used for 3D modeling of each room.

Herein, for example, the capture position acquisition unit 303 can perform feature point matching based on images at adjacent photo capture points among the multiple images used for positioning that are captured by the photo capture unit 301, to obtain relative displacement and capture direction information of each photo capture point, for example, can build a tracking map that includes all photo capture points in the same coordinate system, so as to obtain position and capture direction information of the photo capture unit 301 when capturing the image used for 3D modeling of the room in which the photo capture unit 301 is located.

Herein, for example, the capture position acquisition unit 303 can further obtain, based on positioning information and direction information provided by the photo capture unit 301 when capturing an image used for 3D modeling of a room in which the photo capture unit 301 is located, position and capture direction information of the photo capture unit 301 when capturing the image used for 3D modeling of the room in which the photo capture unit 301 is located.

Herein, the capture position acquisition unit 303 further corrects the tracking map based on acceleration information and velocity information provided by an acceleration sensor and a velocity sensor of the photo capture unit 301.

A 3D model assembling unit 304 is configured to: based on the position and capture direction information of each room obtained by the capture position acquisition unit 303, assemble the 3D models of the rooms generated by the 3D model generation unit 302 in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the rooms.

Herein, the 3D model assembling unit 304 can further convert local coordinates of the 3D model of a single room into global coordinates, for example, by using a transformation matrix based on the position and capture direction information obtained by the capture position acquisition unit 303 when each room is captured, so as to obtain the overall 3D model of all photo capture points.

Herein, the 3D model assembling unit 304 can perform a correction on 3D models of the multiple rooms, including correcting wall line directions of all rooms by using a statistical method. For indoor scenes, in most cases, walls of each room meet the parallel and vertical relationships. By finding an average or median of the wall line directions of each room, or using algorithms such as Random Sample Consensus (RANSAC) to identify the most possible wall line direction, the rooms with errors within a specific range are adjusted to the same direction, so that wall lines of all rooms are made parallel if they were within a specific error range prior to correction.
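A minimal sketch of this statistical correction is given below: wall line directions are treated modulo 90 degrees (parallel and perpendicular walls form one family), a dominant direction is estimated by a circular mean (a RANSAC-style vote could be used instead), and rooms whose deviation is within a tolerance are rotated onto that direction. The angles are illustrative values.

    import numpy as np

    def snap_wall_directions(room_angles_deg, tolerance_deg=5.0):
        theta = np.radians(np.asarray(room_angles_deg, dtype=float))
        # Circular mean with a 90-degree period (angles multiplied by 4).
        dominant = 0.25 * np.arctan2(np.mean(np.sin(4 * theta)), np.mean(np.cos(4 * theta)))
        # Deviation of each room from the dominant direction, wrapped to [-45, 45) degrees.
        delta = (theta - dominant + np.pi / 4) % (np.pi / 2) - np.pi / 4
        snap = np.abs(np.degrees(delta)) <= tolerance_deg
        corrected = np.where(snap, theta - delta, theta)
        return np.degrees(corrected), np.degrees(dominant)

    angles, dominant = snap_wall_directions([0.8, 89.2, 178.9, 46.0])   # the 46.0 outlier is left unchanged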

Herein, when assembling the 3D models of the rooms, the 3D model assembling unit 304 can further correct one or more overlapping parts and/or gaps. Herein, the correction method may include at least one of the following ways:

Assuming that the position of the room is accurate, but there is an error in outline recognition, the overlapping part is trimmed and the gap is filled.

Assuming that the outline of the room is recognized accurately, but there is an error in the position, the position of each room is moved to eliminate the overlap and the gap as far as possible.

Certainly, in practice, the two methods can be performed repeatedly and iteratively to get close to the real situation.

A 2D floorplan generation unit 305 is configured to generate a 2D floorplan in the following ways:

1. projecting each surface of the generated 3D model onto a plane parallel to the floor, and merging these projections into a polygon (a brief sketch of this projection-and-merge procedure is given after this list);

2. correcting and simplifying the obtained polygon, which may include, for example, the following methods:

(1) retaining only main vertices of the polygon and deleting small concave or convex rectangles; for example, concave or convex rectangles less than the standard wall thickness, e.g., 12 cm or 24 cm, can be deleted; and

(2) using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions;

certainly, the obtained polygon can be corrected and simplified in other ways, which are not limited to the above approaches;

3. assembling the generated 2D floorplans of the rooms in the same two-dimensional coordinate system based on the position and capture direction information of each room obtained by the capture position acquisition unit 303, to generate an overall 2D floorplan from the individual 2D floorplans of the rooms; and

4. identifying and marking a position of a door and/or a window,including identifying the position of the door and/or the window on theindoor image by using a deep learning method, or determining theposition of the door by finding where a room outline is crossed by thetrack of the tracking map from capturing the first images of multiplerooms of the same property by the photo capture unit 301.

Herein, in one or more implementations, for example, the method for identifying the position of the door and/or the window on the indoor image by using the deep learning method may be detecting each target object such as the door and/or the window by using YOLO (You Only Look Once: Unified, Real-Time Object Detection).
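For the track-based alternative, the sketch below marks door candidates by intersecting each segment of the camera track with each edge of the room's 2D outline; the helper names are illustrative and the geometry is intentionally simplistic (no de-duplication of nearby crossings).

```python
def _segments_intersect(p1, p2, q1, q2):
    """Return the intersection point of segments p1-p2 and q1-q2, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel or degenerate segments
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def door_candidates(track, room_outline):
    """Mark points where the camera track crosses the room outline as doors.

    track: ordered (x, z) positions taken from the tracking map.
    room_outline: (x, z) vertices of the room's 2D footprint (closed implicitly).
    """
    doors = []
    edges = list(zip(room_outline, room_outline[1:] + room_outline[:1]))
    for a, b in zip(track, track[1:]):
        for e1, e2 in edges:
            hit = _segments_intersect(a, b, e1, e2)
            if hit is not None:
                doors.append(hit)
    return doors
```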

Herein, the 2D floorplan generation unit 305 can further correct 2D floorplans of the multiple rooms, including correcting wall line directions of all the rooms by using a statistical method, so that wall lines of all the rooms are aligned in the same direction if they were parallel within a specific error range. Herein, the uniform correction method may be the same as that described above, and details are omitted for simplicity.

Herein, when assembling the 2D floorplans of the rooms, the 2D floorplan generation unit 305 can further correct one or more overlapping parts and/or gaps.

Herein, the 2D floorplan generation unit can further generate a 2D floorplan in the following ways:

1. projecting each surface of the overall 3D model generated by the 3D model assembling unit 304 onto a plane parallel to the floor, and merging these projections into one or more polygons;

2. correcting and simplifying the obtained polygon(s), which may include, for example, the following methods:

(1) retaining only main vertices of the polygon and deleting small concave or convex rectangles; and

(2) using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions;

certainly, the obtained polygon can be corrected and simplified in other ways, which is not limited to the above approaches; and

3. identifying and marking a position of a door and/or a window, including identifying the position of the door and/or the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property by the photo capture unit 301.

Herein, in one or more implementations, for example, the method for identifying the position of the door and/or the window on the indoor image by using the deep learning method may be YOLO (You Only Look Once: Unified, Real-Time Object Detection).

Photography-Based 3D Modeling Method

FIG. 4 is a schematic flowchart illustrating a photography-based 3D modeling method according to the present disclosure.

Referring to FIG. 4, the photography-based 3D modeling method provided in the present disclosure includes the following steps:

(S1) attaching a mobile device (including a mobile phone, a tablet computer, etc.) with a photo capture function and/or a camera (including a panoramic camera, a fisheye camera, and an ordinary digital camera) to the same camera stand (including a tripod).

(S2) Obtaining multiple images used for positioning from the camera or the mobile device during movement of the stand, and obtaining a position and a capture direction of each photo capture point by using an image processing algorithm and/or one or more sensors of the camera or the mobile device, to build a tracking map that uses a global coordinate system.

Herein, step S2 uses a positioning system of the mobile device or the camera and performs feature point matching based on second images captured by the mobile device or the camera at adjacent photo capture points, to identify relative displacement and capture direction information of the photo capture points, in order to build a tracking map that includes all photo capture points in the same coordinate system and provides a position and a direction of each photo capture point.

Herein, step S2 further includes correcting the tracking map based on information that includes acceleration, velocity, and direction of movement, obtained by using one or more sensors of the mobile device or the camera.

Herein, step S2 further includes obtaining an angle between a capture direction of a lens of the camera and a capture direction of the mobile device, where at an initialization stage, the positioning system based on the mobile device and the positioning system based on the camera run simultaneously, and the stand is moved by a specific distance; in such case, the two systems each provide one displacement vector, and an angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile device; an angle consistent with the capture direction of the mobile device is specified by manually adjusting the camera and the mobile device to angles with consistent orientation, for example, by rotating a preview image or a captured image of the camera; preview images or captured images of the mobile device and the camera are matched by using an image recognition algorithm, to identify the angle; or an additional mark is used (including adding a mark to the stand to form a fixed angle with a mounting direction of the mobile device), and then the mark is identified in the preview image or the image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile device.
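The displacement-vector variant of this initialization can be illustrated with a few lines of arithmetic, assuming both positioning systems report a horizontal displacement while the stand is moved:

```python
import math

def capture_direction_offset(phone_disp, camera_disp):
    """Signed angle (radians) between the camera lens direction and the mobile
    device direction, estimated from the two displacement vectors reported
    while the stand is moved during initialization.

    phone_disp, camera_disp: (x, z) horizontal displacements observed by the
    phone-based and camera-based positioning systems, respectively.
    """
    ax, az = phone_disp
    bx, bz = camera_disp
    # atan2 of the cross and dot products gives the signed angle from a to b.
    return math.atan2(ax * bz - az * bx, ax * bx + az * bz)
```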

(S3) Generating 3D models on the mobile device or a remote server by using a deep learning algorithm or other methods based on an image used for 3D modeling that is captured at each photo capture point, to obtain a 3D model and/or a 2D floorplan of each photo capture point.

Herein, step S3 includes the following:

(S31) identifying one or more image areas of at least one of a floor, a ceiling, and a wall in the image based on a deep learning method; and

(S32) dividing the identified image area(s) into blocks based on an image processing technology, where each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generating the 3D model by solving an equation for each plane, where for two planes that intersect in the image, an intersecting line of the two planes is used as a constraint, so that an error between a calculated intersecting line and an actually observed intersecting line is minimized.

Herein, step S3 further includes: using a computer vision algorithm to identify wall corners in an indoor image, and connecting the wall corners to generate a rough model of a room. Herein, in one or more implementations, for example, the method for identifying wall corners in the image may be using the training framework of self-supervised interest point detection and description, for example, using an approach described in SuperPoint: Self-Supervised Interest Point Detection and Description or other suitable approaches, and then connecting the wall corners to generate a rough model of the room, so as to capture a geometric relationship between objects such as wall corners that frequently appear in the same three-dimensional space structure.
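As a rough illustration of the corner-based route, the sketch below uses OpenCV's Shi-Tomasi corner detector as a simple stand-in for a learned interest-point detector such as SuperPoint; a real pipeline would additionally filter candidates to wall/floor junctions before connecting them into a room outline.

```python
import cv2

def rough_corner_candidates(indoor_image_path, max_corners=50):
    """Detect candidate wall-corner points in an indoor image.

    Shi-Tomasi corners are used here only as a stand-in for a learned
    interest-point detector; the returned (x, y) pixel locations would still
    need to be filtered and connected to form a rough room model.
    """
    gray = cv2.imread(indoor_image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(indoor_image_path)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.05, minDistance=20)
    return [] if corners is None else corners.reshape(-1, 2)
```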

(S4) Placing the individual 3D models of all photo capture points in the global three-dimensional coordinate system based on the position and the capture direction obtained in S2; connecting individual 3D models of multiple photo capture points to generate an overall 3D model and/or 2D floorplan of the multiple photo capture points; and correcting wall directions of all rooms and optimizing the overlap(s) and gap(s). In common room layouts, rooms are usually composed of parallel walls; however, in a room model generated from a single photo capture point, walls that are actually parallel may have an error in their directions (non-parallel). By considering the wall directions of multiple rooms, a uniform direction is identified and the wall directions of all rooms are adjusted based on the uniform direction.

Herein, step S4 includes the following:

(S41) converting local coordinates of a 3D model of a single photo capture point into global coordinates, for example, by using a transformation matrix based on the position and the capture direction of each photo capture point, so as to obtain an overall 3D model of all photo capture points;

(S42) performing a correction on the 3D models of multiple photo capture points, including correcting wall line directions of all photo capture points by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within a specific error range; and

(S43) when assembling the 3D models of the photo capture points, correcting one or more overlapping parts and/or gaps.

(S5) automatically generating a virtual roaming effect between panoramic images on the mobile device.

The following describes application of the photography-based 3D modeling method in the present implementation with reference to the photography-based 3D modeling system.

I. Hardware System

In the present implementation, the mobile phone and the camera are attached to the same stand (including a tripod, etc.).

II. System Initialization

In the present disclosure, one of the following two methods is used to obtain the capture position of each photo capture point and the capture direction of the camera:

Method (1): Based on the positioning system of the mobile phone, that is, using the images (photos, videos, or preview images) of the mobile phone, feature point matching is performed based on images at adjacent photo capture points to identify displacement of the photo capture points, and the sensors (including a gyroscope, an accelerometer, a compass, or other inertial sensors, etc.) of the mobile device are preferably used for correction, so as to build a tracking map and provide positions and directions of the photo capture points.

Method (2): Based on the positioning system of the camera, that is, using the images (photos, videos, or preview images) of the camera, feature point matching is performed based on images at adjacent photo capture points to identify displacement of the photo capture points; preferably, continuous feature matching and positioning are performed with photo capture points centimeters or decimeters apart, with corrections done using sensor data (such as a gyroscope, an accelerometer, a compass, etc.) of the camera, so as to build a tracking map and provide positions and directions of the photo capture points.

Comparison of the two methods: Method (1) is based on the mobile phone system. Because the mobile phone has multiple sensors, it can often provide absolute coordinate information that is relatively accurate, and can measure an absolute distance between the photo capture points. However, this method requires an additional initialization process prior to usage.

In method (2), because the camera often does not have good built-in sensors, it can provide only relative coordinates of the capture position. It does not require additional initialization to align the coordinate axes of the 3D model of a single photo capture point with the generated track; in addition, when the capture path comes around to form a loop, this method may provide smaller positioning errors.

When method (1) is used, the coordinates provided by the mobile phone are based on the local coordinate system of the mobile phone (generally, one axis points perpendicularly to the ground, and the other two axes point in the front-rear and left-right directions, respectively). However, the coordinate system of the 3D model generated from panoramic photos is based on the coordinate system of the camera. The coordinate axes of the mobile phone and the camera do not align with each other. To solve this problem, the system needs to be initialized, and a manual or an automatic method can be used:

Manual method: A user uses an additional measurement tool or adds a mark on a device such as the stand, or manually enters an angle between the capture direction of the lens of the camera and the capture direction of the mobile phone.

Automatic method: At the initialization stage, method (1) and method (2) are performed simultaneously, and the device is moved by a specific distance, preferably 1 to 3 meters. In such case, the two systems each provide one displacement vector, and the angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile phone.

III. Determining the Position of a Photo Capture Point and the Capture Direction

Once running, the above system can provide position and capture direction information of the photo capture unit.

IV. Generation of a 3D Model for a Single Photo Capture Point

As described in the Background, the two conventional photography-based modeling methods have obvious disadvantages: method (a), which uses a depth-recording camera to directly generate a 3D model, relies on complex and costly hardware that is usually operated by professional photographers; and method (b), which matches feature points between closely spaced photos and then uses Multi View Stereo (MVS) for modeling, is computation intensive, generally requires uploading photos to a server, and gives no reliable guidance on how far apart photo capture points should be, so modeling may fail without any warning during capture.

To overcome the above disadvantages, the present disclosure uses an innovative method. To improve the timeliness of model generation and to achieve a WYSIWYG (What You See Is What You Get) experience, 3D model generation typically includes only room outlines (wall positions), without including models of furniture and decorations that are not essential to the room structure. To be specific:

i. Areas such as a floor, a ceiling, a wall, and a roof in the image are identified by using a deep learning method. For a plane on which one of these areas is located, either its normal direction is known (as in the case of the floor and the ceiling) or its normal lies in a horizontal plane (as in the case of a wall).

ii. The image is divided into blocks by using image processing technology, where each block can be approximately considered as one plane. For a block of the floor, the plane equation is known. Assuming that the y-axis points up vertically, the equation of a block of the floor is y+1=0. For a block of the wall, the plane equation is Ax+Cz+D=0. For a block of the ceiling, the plane equation is y+D=0. For other blocks, the plane equation is Ax+By+Cz+D=0. The process of generating a 3D model is that of solving these plane equations. For two planes that intersect in the image, there is an intersecting line visible in the image. Using the intersecting line as a constraint, the above equation solving process can be changed into a minimization problem, so that for the two planes that intersect, an error between a calculated intersecting line and an actually observed intersecting line is minimized.
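As a simplified illustration of this minimization, the sketch below fits a single vertical wall plane Ax+Cz+D=0 to points of the observed wall-floor boundary, which are assumed to have already been back-projected onto the known floor plane y+1=0; minimizing the point-to-line residuals makes the calculated intersecting line agree with the observed one. SciPy is used here only for convenience.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_wall_plane(boundary_points_xz):
    """Fit a vertical wall plane A*x + C*z + D = 0 with A^2 + C^2 = 1.

    boundary_points_xz: (x, z) points of the observed wall-floor intersection
    line, assumed to have been back-projected onto the floor plane y + 1 = 0.
    The residual of each point is its distance to the candidate wall line, so
    the solution minimizes the disagreement between the calculated and the
    observed intersecting line.
    """
    pts = np.asarray(boundary_points_xz, dtype=float)

    def residuals(params):
        theta, d = params
        return pts[:, 0] * np.cos(theta) + pts[:, 1] * np.sin(theta) + d

    theta, d = least_squares(residuals, x0=np.array([0.0, 0.0])).x
    return np.cos(theta), np.sin(theta), d  # A, C, D of the wall plane
```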

iii. Other methods can also be used to model a scene. For example, in an indoor scene, a computer vision algorithm can be combined with deep learning to identify wall corners in an image, and the wall corners can then be connected to generate a rough model of a room. Herein, in one or more implementations, for example, the method for identifying wall corners in the image may be using the training framework of self-supervised interest point detection and description (for example, refer to SuperPoint: Self-Supervised Interest Point Detection and Description), and then connecting the wall corners to generate a rough model of the room, so as to capture the geometric relationship between objects such as wall corners that frequently appear in the same three-dimensional space structure.

iv. A 2D floorplan is generated. After a 3D model of each photo capture point is obtained, a floorplan can be further generated. This is especially useful for applications of indoor scenes where a floorplan is often desired. The method is as follows:

1. Project each surface of the 3D model onto a 2D top view plane.

2. Merge these projections into a large polygon.

3. Correct and simplify the obtained polygon, which may include, for example, the following methods:

(a) The obtained polygon usually has a large quantity of points, and the polygon can be simplified. Only the main vertices of the polygon on the 2D floorplan are retained, and small concave or convex rectangles are deleted (see the simplification sketch following this list).

(b) For an indoor scene, a computer vision algorithm can be used to detect straight lines in the picture, which are then used to determine the direction of a wall. Edges that are approximately parallel or perpendicular to the direction of the wall are aligned to the corresponding directions.

4. Identify a door and/or a window. For an indoor scene, the door and/or the window need/needs to be marked on the 2D floorplan by using the following two methods:

(a) The deep learning method is directly used to identify the position and size of a door and/or a window in a panoramic image.

Herein, in one or more implementations, for example, the method for identifying the position and the size of the door and/or the window on the indoor image by using the deep learning method may be YOLO (You Only Look Once: Unified, Real-Time Object Detection).

(b) The positioning system based on the mobile phone or the camera not only provides a position and a capture direction of each photo capture point, but also records a movement track of the camera for the entire photo capture process. Where the track crosses the room outline positively identifies the position of the door.
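The simplification in step 3(a) can be illustrated by the sketch below, which collapses small rectangular notches by removing any vertex whose two adjacent edges are both shorter than roughly a wall thickness; the threshold and helper names are illustrative assumptions.

```python
import math

def simplify_outline(vertices, min_edge=0.24):
    """Collapse small concave/convex notches in a floorplan polygon.

    vertices: ordered (x, z) points of the projected room outline.
    min_edge: edges shorter than this (in metres, roughly a wall thickness)
    are treated as noise; a vertex whose two adjacent edges are both short is
    removed, which deletes small rectangular notches while keeping the main
    vertices of the polygon.
    """
    def length(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    changed = True
    while changed and len(vertices) > 4:
        changed = False
        for i in range(len(vertices)):
            prev_v = vertices[i - 1]
            next_v = vertices[(i + 1) % len(vertices)]
            if (length(prev_v, vertices[i]) < min_edge
                    and length(vertices[i], next_v) < min_edge):
                vertices = vertices[:i] + vertices[i + 1:]
                changed = True
                break
    return vertices
```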

V. Generation of 3D Models and 2D Floorplans for Multiple Photo Capture Points

In section IV above, a 3D model of each photo capture point is generated. Coordinates of the obtained 3D models are all relative coordinates with respect to the photo capture points. In order to assemble these models and to generate an overall 3D model and a 2D floorplan, first, local coordinates of a single model are converted into global coordinates, for example, by using a transformation matrix based on the known position and capture direction of each photo capture point.

On top of the above, further corrections can be made to the model and the floorplan.

i. The wall line directions are often inaccurate when the model of an individual photo capture point is generated. After multiple points are captured, all photo capture points can be corrected collectively by using a statistical method such as Random Sample Consensus (RANSAC) to identify the best wall line direction, so that wall lines of all the rooms are aligned in the same direction if they were parallel within a specific error range; small inconsistencies of wall line directions can thus be avoided.

ii. Due to errors introduced in model generation, there may be one or more overlaps, gaps, etc. when 3D models and 2D floorplans of multiple photo capture points are placed next to each other. Overlaps can be automatically removed and gaps can be filled on the 2D floorplan.

VI. Timely Showing of Results

The above process can be performed automatically and entirely on a mobile phone. Following completion, 3D models, 2D floorplans, and virtual roaming become readily available on the mobile phone, and can be uploaded to the cloud to be shared with others.

VII. Manual Editing

Since errors may be introduced by the positioning system, the 3D modeling algorithm for a single photo capture point, and the various phases of optimizing the 3D models/2D floorplans of multiple photo capture points, in order to obtain a 3D model with higher precision, the present disclosure allows a user to manually edit the photo capture result. Manual editing can be performed by using software-based review and editing tools.

Automatic 3D Modeling Apparatus

FIG. 5 is a schematic structural diagram illustrating an implementation of an automatic 3D modeling apparatus according to the present disclosure. As shown in FIG. 5, the automatic 3D modeling apparatus includes the following:

a 3D model generation unit 501, configured to: based on an image used for 3D modeling of each of multiple spaces included in a modeling object, generate a 3D model of each space; and

a 3D model assembling unit 502, configured to: based on position and capture direction information when the image used for 3D modeling of each of the multiple spaces is captured, assemble the 3D models of the spaces generated by the 3D model generation unit 501 in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the spaces.

Herein, the 3D model assembling unit 502 can further convert local coordinates of the 3D model of a single space into global coordinates, for example, by using a transformation matrix based on the position and capture direction information, so as to obtain the overall 3D model of all spaces.

FIG. 6 is a schematic structural diagram illustrating another implementation of an automatic 3D modeling apparatus according to the present disclosure. In the present implementation, for example, an automatic 3D modeling space is a room, and an image used for 3D modeling is an indoor image of the room.

As shown in FIG. 6, the present implementation includes a 3D model generation unit 601, configured to: based on an image used for 3D modeling of each of multiple rooms included in a modeling object, generate a 3D model of each room.

Herein, the 3D model generation unit 601 identifies one or more image areas of at least one of a floor, a ceiling, and a wall in the image used for 3D modeling based on a deep learning method; divides the identified image area(s) into blocks based on an image processing technology, where each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generates the 3D model by solving an equation for each plane, where for two planes that intersect in the image used for 3D modeling, an error between a calculated intersecting line and an actually observed intersecting line is minimized.

Herein, the 3D model generation unit 601 further uses a computer vision algorithm to identify wall corners in the indoor image and connect the wall corners to generate a rough model of the room. Herein, in one or more implementations, for example, the method for identifying wall corners in the image may be using the training framework of self-supervised interest point detection and description, for example, using an approach as described in SuperPoint: Self-Supervised Interest Point Detection and Description or other suitable approaches, and then connecting the wall corners to generate a rough model of the room, so as to capture a geometric relationship between objects such as wall corners that frequently appear in the same three-dimensional space structure.

A 3D model assembling unit 602 is configured to: based on position and capture direction information when the image used for 3D modeling of each of the multiple rooms is captured, assemble the individual 3D models of the rooms generated by the 3D model generation unit 601 in the global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the rooms.

Herein, the 3D model assembling unit 602 can further convert local coordinates of the 3D model of a single room into global coordinates, for example, by using a transformation matrix based on the position and capture direction information, so as to obtain the overall 3D model of all rooms.

Herein, the 3D model assembling unit 602 can further correct 3D models of the multiple rooms, including correcting wall line directions of all rooms by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within a specific error range.

Herein, when assembling the 3D models of the rooms, the 3D model assembling unit 602 can further correct one or more overlapping parts and/or gaps.

A 2D floorplan generation unit 603 is configured to generate a 2Dfloorplan in the following ways:

1. projecting each surface of the generated 3D model onto a planeparallel to the floor, and merging these projections into a polygon;

2. correcting and simplifying the obtained polygon, which may include,for example, the following methods:

(1) retaining only main vertices of the polygon and deleting smallconcave or convex rectangles; and

(2) using a computer vision algorithm to detect straight lines in thepicture, and then determining the direction of a wall, and aligningedges that are approximately parallel or perpendicular to the directionof the wall to corresponding directions;

certainly, the obtained polygon can be corrected and simplified in otherways, which is not limited to the above approaches;

3. assembling the generated 2D floorplans of the rooms in the sametwo-dimensional coordinate system based on the position and capturedirection information, to generate an overall 2D floorplan from theindividual 2D floorplans of the rooms; and

4. identifying and marking a position of a door and/or a window,including identifying the position of the door and/or the window on theindoor image by using a deep learning method, or determining theposition of the door by finding where a room outline is crossed by thetrack of the tracking map from capturing the first images of multiplerooms of the same property.

Herein, in one or more implementations, for example, the method foridentifying the position and the size of the door and/or the window onthe indoor image by using the deep learning method may be YOLO (You OnlyLook Once: Unified, Real-Time Object Detection).

Herein, the 2D floorplan generation unit 603 can further correct 2Dfloorplans of the multiple rooms, including correcting wall linedirections of all the rooms by using a statistical method, so that walllines of all the rooms are aligned in the same direction if they wereparallel within a specific error range.

Herein, when assembling the 2D floorplans of the rooms, the 2D floorplangeneration unit 603 can further correct one or more overlapping partsand/or gaps.

Herein, the 2D floorplan generation unit 603 can further generate a 2Dfloorplan in the following ways:

1. projecting each surface of the overall 3D model generated by the 3Dmodel assembling unit 602 onto a plane parallel to the floor, andmerging these projections into one or more polygons;

2. correcting and simplifying the obtained polygon(s), which mayinclude, for example, the following methods:

(1) retaining only main vertices of the polygon and deleting smallconcave or convex rectangles; and

(2) using a computer vision algorithm to detect straight lines in thepicture, and then determining the direction of a wall, and aligningedges that are approximately parallel or perpendicular to the directionof the wall to corresponding directions;

certainly, the obtained polygon can be corrected and simplified in otherways, which is not limited to the above approaches; and

3. identifying and marking a position of a door and/or a window,including identifying the position of the door and/or the window on theindoor image by using a deep learning method, or determining theposition of the door by finding where a room outline is crossed by thetrack of the tracking map from capturing the first images of multiplerooms of the same property. For example, the specific method is usingthe above YOLO model. Details are omitted herein for simplicity.

Automatic 3D Modeling Method

FIG. 7 is a schematic flowchart illustrating an implementation of anautomatic 3D modeling method according to the present disclosure. Asshown in FIG. 7, the automatic 3D modeling method includes thefollowing:

3D model generation step S71: based on an image used for 3D modeling ofeach of multiple spaces included in a modeling object, generate a 3Dmodel of each space.

In the present implementation, for example, an automatic 3D modelingspace is a room, and an image used for 3D modeling is an indoor image ofthe room.

In the 3D model generation step S71, one or more image areas of at leastone of a floor, a ceiling, and a wall in the image used for 3D modelingare identified based on a deep learning method; the identified imagearea(s) is divided into blocks based on an image processing technology,where each block is approximately considered as one plane, image blocksof the floor and the ceiling are located on a horizontal plane, and animage block of the wall is located on a vertical plane; and the 3D modelis generated by solving an equation for each plane, where for two planesthat intersect in the image used for 3D modeling, an error between acalculated intersecting line and an actually observed intersecting lineis minimized.

In the 3D model generation step S71, a computer vision algorithm isfurther used to identify wall corners in the indoor image and the wallcorners are connected to generate a rough model of the room. Herein, inone or more implementations, for example, the method for identifyingwall corners in the image may be using the training framework ofself-supervised interest point detection and description, for example,using an approach as described in SuperPoint: Self-Supervised InterestPoint Detection and Description or other suitable approaches, and thenconnecting the wall corners to generate a rough model of the room, so asto capture a geometric relationship between objects such as wall cornersthat frequently appear in the same three-dimensional space structure.

3D model assembling step S72: based on position and capture directioninformation when the image used for 3D modeling of each of the multiplerooms is captured, assemble the 3D models of the rooms generated in the3D model generation step S71 in the global three-dimensional coordinatesystem, to generate an overall 3D model from the individual 3D models ofthe rooms.

Herein, in the 3D model assembling step S72, local coordinates of the 3Dmodel of a single space can be further converted into globalcoordinates, for example, by using a transformation matrix based on theposition and capture direction information, so as to obtain the overall3D model of all spaces.

2D floorplan generation step S73: generate a 2D floorplan in thefollowing ways:

1. projecting each surface of the generated 3D model onto a planeparallel to the floor, and merging these projections into a polygon;

2. correcting and simplifying the obtained polygon, which may include,for example, the following methods:

(1) retaining only main vertices of the polygon and deleting smallconcave or convex rectangles; and

(2) using a computer vision algorithm to detect straight lines in thepicture, and then determining the direction of a wall, and aligningedges that are approximately parallel or perpendicular to the directionof the wall to corresponding directions;

herein, the obtained polygon can be corrected and simplified in otherways, which is not limited to the above approaches; and

3. assembling the generated 2D floorplans of the rooms in the sametwo-dimensional coordinate system based on the position and capturedirection information, to generate an overall 2D floorplan from theindividual 2D floorplans of the rooms; and

4. identifying and marking a position of a door and/or a window,including identifying the position of the door and/or the window on theindoor image by using a deep learning method, or determining theposition of the door by finding where a room outline is crossed by thetrack of the tracking map from capturing the first images of multiplerooms of the same property. For example, the specific method is usingthe above YOLO model. Details are omitted herein for simplicity.

Herein, in the 2D floorplan generation step S73, 2D floorplans of themultiple rooms can be further corrected, including correcting wall linedirections of all the rooms by using a statistical method, so that walllines of all the rooms are aligned in the same direction if they wereparallel within a specific error range.

Herein, in the 2D floorplan generation step S73, when the 2D floorplansof the rooms are assembled, one or more overlapping parts and/or gapscan be further corrected.

Herein, in the 2D floorplan generation step S73, a 2D floorplan can befurther generated in the following ways:

1. projecting each surface of the overall 3D model generated in the 3Dmodel assembling step S72 onto a plane parallel to the floor, andmerging these projections into one or more polygons;

2. correcting and simplifying the obtained polygon(s), which mayinclude, for example, the following methods:

(1) retaining only main vertices of the polygon and deleting smallconcave or convex rectangles; and

(2) using a computer vision algorithm to detect straight lines in thepicture, and then determining the direction of a wall, and aligningedges that are approximately parallel or perpendicular to the directionof the wall to corresponding directions;

certainly, the obtained polygon can be corrected and simplified in otherways, which is not limited to the above approaches;

3. identifying and marking a position of a door and/or a window,including identifying the position of the door and/or the window on theindoor image by using a deep learning method, or determining theposition of the door by finding where a room outline is crossed by thetrack of the tracking map from capturing the first images of multiplerooms of the same property. For example, the specific method is usingthe above YOLO model. Details are omitted herein for simplicity.

Electronic Device

FIG. 8 is a schematic structural diagram illustrating an electronicdevice (for example, the mobile device or the server in FIG. 1) 800 thatis suitable for implementing an implementation of the presentdisclosure. The electronic device in the implementation of the presentdisclosure may be any mobile device in the above system, and bepreferably a mobile device with a photo capture function. The electronicdevice is attached to a stand (such as a tripod) independently orjointly with another electronic terminal device such as a camera, tocooperate with application software running in various mobile operatingsystems to implement the implementation method in the presentdisclosure. The electronic device shown in FIG. 8 is merely an example,and shall not impose any limitation on a function and an applicationscope of the implementations of the present disclosure.

As shown in FIG. 8, the electronic device 800 can include a processingapparatus (such as a central processing unit and a graphics processingunit) 801 for controlling an overall operation of the electronic device.The processing apparatus can include one or more processors forexecuting instructions to perform all or some of the steps of the abovemethod. In addition, the processing apparatus 801 can include one ormore modules to process interaction with other apparatuses or units.

A storage apparatus 802 is configured to store various types of data.The storage apparatus 802 can include various types of computer-readablestorage media or a combination thereof. For example, the storageapparatus 802 can be an electrical, magnetic, optical, electromagnetic,infrared, or semiconductor system, apparatus, or device, or anycombination thereof. More specific examples of the computer-readablestorage media can include but are not limited to an electricalconnection with one or more conducting wires, a portable computer disk,a hard disk, a random access memory (RAM), a read-only memory (ROM), anerasable programmable read-only memory (EPROM or flash memory), anoptical fiber, a portable compact disk read-only memory (CD-ROM), anoptical storage device, a magnetic storage device, or any suitablecombination of the above. In the present disclosure, thecomputer-readable storage medium may be any tangible medium containingor storing a program. The program may be used by or in combination withan instruction execution system, apparatus, or device.

A sensor apparatus 803 is configured to perceive specified and measuredinformation and convert the information into a usable output signalaccording to a specific rule. One or more sensors can be included. Forexample, the sensor apparatus 803 can include an acceleration sensor, agyroscope sensor, a magnetic sensor, a pressure sensor, or a temperaturesensor, etc., which are used to detect changes in the on/off state,relative positioning, acceleration/deceleration, temperature, humidity,and light of the electronic device.

The processing apparatus 801, the storage apparatus 802, and the sensorapparatus 803 are connected to each other by using a bus 804. Aninput/output (I/O) interface 805 is also connected to the bus 804.

A multimedia apparatus 806 can include input devices such as atouchscreen, a touch pad, a keyboard, a mouse, a camera, and amicrophone to receive an input signal from a user. Various input devicescan cooperate with various sensors of the sensor apparatus 803 tocomplete a gesture operation input, an image recognition input, adistance detection input, etc. The multimedia apparatus 806 can furtherinclude output devices such as a liquid crystal display (LCD), aspeaker, and a vibrator.

A power supply apparatus 807 is configured to supply power to variousapparatuses in the electronic device, and can include a power managementsystem, one or more power supplies, and a component that distributespower to other devices.

A communications apparatus 808 may allow the electronic device 800 toperform wireless or wired communication with other devices to exchangedata.

The above various apparatuses can also be connected to the I/O interface805 to implement application of the electronic device 800.

Although FIG. 8 shows the electronic device 800 having variousapparatuses, it should be understood that not all shown apparatuses needto be implemented or included. More or fewer apparatuses can beimplemented or included alternatively.

In particular, according to the implementations of the presentdisclosure, the process described above with reference to the flowchartcan be implemented as a computer software program. For example, theimplementations of the present disclosure include a computer programproduct that includes a computer program that is carried on anon-transient computer readable medium. The computer program includesprogram code for performing the method shown in the flowchart. In suchan implementation, the computer program can be downloaded and installedfrom a network by using the communications apparatus, or installed fromthe storage apparatus. When the computer program is executed by theprocessing apparatus, the above functions defined in the method in theimplementations of the present disclosure are executed.

In the context of the present disclosure, a machine-readable medium canbe a tangible medium, which can contain or store a program for use by orin combination with an instruction execution system, apparatus, ordevice.

It should be noted that, the above computer-readable medium in thepresent disclosure can be a computer-readable signal medium or acomputer-readable storage medium, or any combination thereof. In thepresent disclosure, the computer-readable signal medium can include adata signal that is propagated in a baseband or as a part of a carrier,and carries computer-readable program code. Such propagated data signalmay take a plurality of forms, including but not limited to anelectromagnetic signal, an optical signal, or any suitable combinationof the above. The computer-readable signal medium may also be anycomputer-readable medium other than the computer-readable storagemedium. The computer-readable signal medium may send, propagate, ortransmit a program for use by or in combination with an instructionexecution system, apparatus, or device. The program code included in thecomputer-readable medium can be transmitted in any suitable medium,including but not limited to a cable, an optical cable, radio frequency(RF), or the like, or any suitable combination of the above.

The above computer-readable medium may be included in the aboveelectronic device, or may exist alone without being assembled into theelectronic device.

Computer program code for performing an operation of the presentdisclosure can be written in one or more program design languages or acombination thereof. The above program design languages include but arenot limited to object-oriented program design languages such as Java,Smalltalk, and C++, and conventional procedural program design languagessuch as C or a similar program design language. The program code can beexecuted entirely on a user computer, partly on a user computer, as aseparate software package, partly on a user computer and partly on aremote computer, or entirely on a remote computer or server. In a caseinvolving a remote computer, the remote computer can be connected to auser computer through any type of network. Alternatively, the remotecomputer can be connected to an external computer (for example, by usingan Internet service provider for connection over the Internet).

The flowcharts and block diagrams in the accompanying drawings show thearchitectures, functions, and operations that may be implementedaccording to the systems, methods, and computer program products invarious implementations of the present disclosure. In this regard, eachblock in the flowchart or block diagram may represent one module, oneprogram segment, or one part of code. The module, the program segment,or the part of code includes one or more executable instructions forimplementing specified logical functions. It should also be noted that,in some alternative implementations, the functions marked in the blocksmay occur in an order different from that marked in the figures. Forexample, two consecutive blocks can actually be executed in parallel,and sometimes they can also be executed in reverse order, depending onthe function involved. It should also be noted that, each block in theblock diagram and/or flowchart, and a combination of blocks in the blockdiagram and/or flowchart can be implemented by using a dedicatedhardware-based system that performs a specified function or operation,or can be implemented by using a combination of dedicated hardware andcomputer instructions.

The units described in the implementations of the present disclosure canbe implemented by software or hardware. In some cases, a name of a unitdoes not constitute a restriction on the unit.

The functions described above in the specification can be performed atleast in part by one or more hardware logic components. For example,without limitation, exemplary types of hardware logic components thatcan be used include a field programmable gate array (FPGA), anapplication-specific integrated circuit (ASIC), an application-specificstandard product (ASSP), a system on chip (SoC), a complex programmablelogic device (CPLD), etc.

The above descriptions are only the preferred implementations of thepresent disclosure and the explanation of the applied technicalprinciples. A person skilled in the art should understand that, thedisclosure scope of the present disclosure is not limited to thetechnical solutions formed by the specific combination of the abovetechnical features, but should also cover other technical solutionsformed by any combination of the above technical features or theirequivalent features without departing from the above disclosed concepts,for example, a technical solution formed by interchanging the abovefeatures and the technical features that are disclosed (but not limitedthereto) in the present disclosure having similar functions.

In addition, although the operations are depicted in a specific order,it should not be construed that these operations need to be performed inthe specific order shown or sequentially. In a specific environment,multi-tasking and concurrent processing may be advantageous. Likewise,although some specific implementation details are included in the abovediscussion, these details should not be construed as a limitation on thescope of the present disclosure. Some features that are described in thecontext of separate implementations can also be implemented incombination in a single implementation. Conversely, various featuresthat are described in the context of a single implementation can also beimplemented in multiple implementations separately or in any suitablesub-combination.

Although the subject matter has been described in languages specific tostructural features and/or methodological logical actions, it should beunderstood that the subject matter defined in the appended claims is notnecessarily limited to the specific features or actions described above.On the contrary, the specific features and actions described above aremerely exemplary forms of the implementations.

The implementation can be further appreciated through the embodiments below.

In an embodiment, an automatic 3D modeling apparatus includes a 3D model generation unit, configured to: based on a first image of each of multiple spaces included in a modeling object, generate a 3D model of each space of the multiple spaces; and a 3D model assembling unit, configured to: based on position and capture direction information of the first image of each of the multiple spaces being captured, assemble 3D models of the multiple spaces generated by the 3D model generation unit in a global three-dimensional coordinate system, to generate an overall 3D model from the individual 3D models of the spaces.

The foregoing and other described embodiments can each, optionally,include one or more of the following features:

A first feature, combinable with any of the other features, specifiesthat the 3D model assembling unit converts local coordinates of the 3Dmodel of a single space into global coordinates based on the positionand capture direction information, so as to obtain the overall 3D modelof all the spaces.

A second feature, combinable with any of the other features, specifiesthat the space is a room; the first image is an indoor image of theroom; the 3D model generation unit identifies one or more image areas ofat least one of a floor, a ceiling, and a wall in the first image basedon a deep learning method; divides the identified image area(s) intoblocks based on an image processing technology, wherein each block isapproximately considered as one plane, image blocks of the floor and theceiling are located on a horizontal plane, and an image block of thewall is located on a vertical plane; and generates the 3D model bysolving an equation for each plane, wherein for two planes thatintersect in the first image, an error between a calculated intersectingline and an actually observed intersecting line is minimized; and the 3Dmodel generation unit further uses a computer vision algorithm toidentify wall corners in the indoor image and connect the wall cornersto generate a rough model of the room.

A third feature, combinable with any of the other features, specifiesthat the 3D model assembling unit corrects 3D models of the multiplerooms, including correcting wall line directions of all rooms by using astatistical method, so that wall lines of all rooms are aligned in thesame direction if they were parallel within a specific error range; andwhen assembling the 3D models of the rooms, the 3D model assembling unitcorrects one or more overlapping parts and/or gaps.

A fourth feature, combinable with any of the other features, specifiesthat the automatic 3D modeling apparatus further comprises: a 2Dfloorplan generation unit, configured to generate a 2D floorplan in thefollowing ways: projecting each surface of the generated 3D model onto aplane parallel to the floor, and merging these projections into apolygon; correcting and simplifying the obtained polygon, including atleast one of the following: (1) retaining only main vertices of thepolygon and deleting small concave or convex rectangles; and (2) using acomputer vision algorithm to detect straight lines in the picture, andthen determining the direction of a wall, and aligning edges that areapproximately parallel or perpendicular to the direction of the wall tocorresponding directions; assembling the generated 2D floorplans of therooms in the same two-dimensional coordinate system based on theposition and capture direction information, to generate an overall 2Dfloorplan from the individual 2D floorplans of the rooms; andidentifying and marking a position of a door and/or a window, includingidentifying the position of the door and/or the window on the indoorimage by using a deep learning method, or determining the position ofthe door by finding where a room outline is crossed by the track of thetracking map from capturing the first images of multiple rooms of thesame property.

A fifth feature, combinable with any of the other features, specifiesthat the 2D floorplan generation unit corrects 2D floorplans of themultiple rooms, including correcting wall line directions of all therooms by using a statistical method, so that wall lines of all the roomsare aligned in the same direction if they were parallel within aspecific error range; and when assembling the 2D floorplans of therooms, the 2D floorplan generation unit corrects one or more ofoverlapping parts and gaps.

A sixth feature, combinable with any of the other features, specifiesthat the automatic 3D modeling apparatus further comprises a 2Dfloorplan generation unit, configured to generate a 2D floorplan in thefollowing ways: projecting each surface of the overall 3D modelgenerated by the 3D model assembling unit onto a plane parallel to thefloor, and merging these projections into one or more polygons;correcting and simplifying the obtained polygon(s), including at leastone of the following: (1) retaining only main vertices of the polygonand deleting small concave or convex rectangles; and (2) using acomputer vision algorithm to detect straight lines in the picture, andthen determining the direction of a wall, and aligning edges that areapproximately parallel or perpendicular to the direction of the wall tocorresponding directions; and identifying and marking a position of adoor and/or a window, including identifying the position of the doorand/or the window on the indoor image by using a deep learning method,or determining the position of the door by finding where a room outlineis crossed by the track of the tracking map from capturing the firstimages of multiple rooms of the same property.

In another embodiment, a photography-based 3D modeling method comprisesthe following steps: attaching a mobile device with a photo capturefunction and a camera to a same camera stand; capturing a plurality offirst images at a plurality of photo capture points using one or more ofthe mobile device and the camera; obtaining multiple second images usingthe camera or the mobile device during movement of the stand among theplurality of photo capture points; obtaining a position and a capturedirection of each photo capture point by optionally using one or moresensors of one or more of the camera and the mobile device; building atracking map that uses a global coordinate system based on the positionof each photo capture point; generating 3D models on the mobile deviceor a remote server based on one or more first images captured at eachphoto capture point; placing the individual 3D models of all photocapture points in the global three-dimensional coordinate system basedon the position and the capture direction of each photo capture point;and connecting the individual 3D models of multiple photo capture pointsto generate an overall 3D model that includes multiple photo capturepoints.

The foregoing and other described embodiments can each, optionally,include one or more of the following features:

A seventh feature, combinable with any of the other features, specifies that the method uses a positioning system of the mobile device or the camera and performs feature point matching based on second images captured by the mobile device or the camera at adjacent photo capture points, to identify relative displacement and capture direction information of the photo capture points, in order to build a tracking map that includes all photo capture points in the global coordinate system and provide a position and a direction of each photo capture point.

An eighth feature, combinable with any of the other features, specifies that the method further comprises correcting the tracking map based on information that includes acceleration, velocity, and direction of movement, obtained by using one or more sensors of the mobile device or the camera.

A ninth feature, combinable with any of the other features, specifies that the method further comprises obtaining an angle between a capture direction of a lens of the camera and a capture direction of the mobile device, wherein at an initialization stage, the positioning system based on the mobile device and the positioning system based on the camera run simultaneously, and the stand is moved by a specific distance; in such case, the two systems each provide one displacement vector, and an angle between the two vectors is the angle between the capture direction of the lens of the camera and the capture direction of the mobile device; an angle consistent with the capture direction of the mobile device is specified by manually rotating a preview image or a captured image of the camera; preview images or captured images of the mobile device and the camera are matched by using an image recognition algorithm, to identify the angle; or an additional mark is used, including adding a mark to the stand to form a fixed angle with a mounting direction of the mobile device, and then the mark is identified in the preview image or the image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile device.

A tenth feature, combinable with any of the other features, specifies that the generating of the 3D models includes: identifying one or more image areas of at least one of a floor, a ceiling, and a wall in the image based on a deep learning method; dividing the identified one or more image areas into blocks based on an image processing technology, wherein each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generating the 3D model by solving an equation for each plane, wherein for two planes that intersect in the image, an intersecting line of the two planes is used as a constraint, so that an error between a calculated intersecting line and an actually observed intersecting line is minimized.

An eleventh feature, combinable with any of the other features, specifies that the generating of the 3D models further includes: using a computer vision algorithm to identify wall corners in an indoor image, and connecting the wall corners to generate a rough model of a room.

A twelfth feature, combinable with any of the other features, specifies that the method further comprises: converting local coordinates of a 3D model of a single photo capture point into global coordinates based on a position and a capture direction of each photo capture point, so as to obtain an overall 3D model of all photo capture points; performing a correction on the individual 3D models of multiple photo capture points, including correcting wall line directions of all photo capture points by using a statistical method, so that wall lines of all rooms are aligned in the same direction if they were parallel within a specific error range; and when assembling the 3D models of the photo capture points, correcting one or more overlapping parts and gaps.

The various embodiments described above can be combined to provide further embodiments. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.

These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

What is claimed is:
1. A system, comprising: a photo capture unit, configured to capture a first image of each of multiple spaces; a 3D model generation unit, configured to generate a 3D model of each space based on the first image that is captured by the photo capture unit for each of the multiple spaces; a capture position acquisition unit, configured to obtain position and capture direction information of the photo capture unit in capturing the first image; and a 3D model assembling unit, configured to, based on the position and capture direction information, combine the 3D models of the multiple spaces in a global three-dimensional coordinate system to generate an overall 3D model that includes the multiple spaces.
2. The system according to claim 1, wherein the photo capture unit captures multiple second images during a process that the photo capture unit moves among the spaces; and the capture position acquisition unit performs feature point matching based on the multiple second images to obtain one or more of relative displacement and capture direction information of each photo capture point.
3. The system according to claim 1, wherein the photo capture unit has one or more of positioning-aware sensors and direction-aware sensors; and the capture position acquisition unit obtains, based on positioning information and/or direction information provided by the photo capture unit in capturing a first image of a space in which the photo capture unit is located, position and/or capture direction information of the photo capture unit in capturing the first image of the space in which the photo capture unit is located.
4. The system according to claim 1, wherein the photo capture unit captures multiple second images during a process that the photo capture unit moves among the spaces; the photo capture unit has a positioning sensor and/or a direction sensor; and the capture position acquisition unit performs feature point matching based on the multiple second images to obtain relative displacement and capture direction information of each photo capture point, and corrects one or more of the relative displacement and the capture direction information based on respective positioning information or direction information provided by the photo capture unit in capturing a first image of a space in which the photo capture unit is located.
5. The system according to claim 4, wherein the photo capture unit includes a displacement sensor and the capture position acquisition unit corrects the relative displacement and/or capture direction information based on displacement information that is obtained by the displacement sensor.
6. The system according to claim 1, wherein the 3D model assembling unit converts local coordinates of a 3D model of a single space into global coordinates based on the position and capture direction information obtained by the capture position acquisition unit.
 7. The system according to claim 6, wherein the converting the local coordinates of the 3D model of the single space into the global coordinates comprises: enabling the photo capture unit to move a predetermined distance; obtaining, by the capture position acquisition unit, positions of two endpoints of the predetermined distance; and obtaining a ratio of the local coordinates to the global coordinates based on a ratio of a distance between the positions of the two endpoints to the predetermined distance.
8. The system according to claim 6, wherein the converting the local coordinates of the 3D model of the single space into the global coordinates comprises: identifying, by the capture position acquisition unit, one or more feature points on the first image; estimating, based on the identified feature point, a vertical distance between a plane on which a floor surface or a ceiling surface of the space is located and the photo capture unit; and calculating a ratio of the vertical distance to a height of the photo capture unit to obtain the scale of the local coordinates to the global coordinates.
9. The system according to claim 8, wherein before performing photo capture at a first photo capture point or during movement of the photo capture unit subsequent to the first capture point, the photo capture unit moves a predetermined distance to obtain a predetermined quantity of the feature points.
10. The system according to claim 1, wherein the photo capture unit has binocular lenses, and the binocular lenses separately capture first images at a same photo capture point; and the 3D model generation unit compares the first images that are captured by the binocular lenses, determines corresponding pixels between the first images captured by the binocular lenses, and obtains depth information of each corresponding pixel.
11. The system according to claim 1, wherein the 3D model generation unit predicts the depth of each pixel in the first image by using a deep learning method, and calculates a normal direction of each pixel or predicts the normal direction of each pixel by using the deep learning method.
12. The system according to claim 1, wherein the photo capture unit is implemented by one or more of a camera and a mobile device with a photo capture function; the 3D model generation unit is implemented by one or more of the mobile device and a remote server; the capture position acquisition unit is implemented by the camera or the mobile device; and the 3D model assembling unit is implemented by one or more of the mobile device and a remote server.
13. The system according to claim 12, wherein the camera and the mobile device with a photo capture function for implementing the photo capture unit are attached to a same camera stand; during movement of the stand, multiple second images are captured by the camera or the mobile device; and one or more of the position and capture direction information of the camera or the mobile device in capturing the first image of a space in which the camera or the mobile device is located is determined at least in part based on the multiple second images.
14. The system according to claim 13, wherein based on a positioning system of the camera or the mobile device, feature point matching is performed on second images at adjacent photo capture points to obtain relative displacement and/or capture direction information of each photo capture point, thereby providing a relative position and/or direction of each photo capture point.
15. The system according to claim 13, wherein the photo capture unit obtains an angle between a capture direction of a lens of the camera and a capture direction of the mobile device by using one or more of the following methods: simultaneously running a positioning system based on the mobile device and a positioning system based on the camera, moving the stand by a specific distance to obtain two displacement vectors by the positioning systems, and determining an angle between the two displacement vectors as the angle between the capture direction of the lens of the camera and the capture direction of the mobile device; specifying an angle consistent with the capture direction of the mobile device by manually rotating a preview image or a captured image of the camera; matching preview images or captured images of the mobile device and the camera by using an image recognition algorithm, to identify the angle; using an additional mark (including adding a mark to the stand to form a fixed angle with a mounting direction of the mobile device), and then identifying the mark in a preview image or an image of the camera, so as to calculate the angle between the capture direction of the lens of the camera and the capture direction of the mobile device; and using a camera installation interface on the stand so that a known fixed angle is formed between the camera and the mobile device.
16. The system according to claim 1, wherein the space is a room; the first image is an indoor image of the room; and the 3D model generation unit identifies one or more image areas of at least one of a floor, a ceiling, and a wall of the room in the first image based on a deep learning method; divides the identified image areas into blocks based on an image processing technology, wherein each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and generates the 3D model by solving an equation for each plane, wherein for two planes that intersect in the first image, an error between a calculated intersecting line and an actually observed intersecting line is minimized.
17. The system according to claim 1, wherein the 3D model generation unit further uses a computer vision algorithm to identify wall corners in the indoor image and connect the wall corners to generate a rough model of the room.
18. The system according to claim 17, wherein the 3D model assembling unit corrects 3D models of the multiple rooms, including correcting wall line directions of all rooms by using a statistical method, so that wall lines of all rooms are substantially aligned in the same direction if they were parallel within a specific error range; and in assembling the 3D models of the rooms, the 3D model assembling unit corrects one or more overlapping parts and/or gaps.
19. The system according to claim 17, further comprising: a 2D floorplan generation unit, configured to generate a 2D floorplan including: projecting each surface of the generated 3D model onto a plane parallel to the floor, and merging these projections into a polygon; correcting and simplifying the obtained polygon, including at least one of the following: retaining only main vertices of the polygon and deleting small concave or convex rectangles; and using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; assembling the generated 2D floorplans of the rooms in the same two-dimensional coordinate system based on the position and capture direction information of each space obtained by the capture position acquisition unit, to generate an overall 2D floorplan from the individual 2D floorplans of the rooms; and identifying and marking a position of at least one of a door and a window, including identifying the position of the at least one of the door and the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of a same property.
20. The system according to claim 19, wherein the 2D floorplan generation unit corrects 2D floorplans of the multiple rooms, including correcting wall line directions of all the rooms by using a statistical method, so that wall lines of all the rooms are aligned in a same direction if they were parallel within a specific error range; and in assembling the 2D floorplans of the rooms, the 2D floorplan generation unit corrects one or more overlapping parts or gaps.
21. The system according to claim 17, further comprising: a 2D floorplan generation unit, configured to generate a 2D floorplan in the following ways: projecting each surface of the overall 3D model generated by the 3D model assembling unit onto a plane parallel to the floor, and merging these projections into one or more polygons; and correcting and simplifying the obtained one or more polygons, including at least one of the following: retaining only main vertices of the polygon and deleting small concave or convex rectangles; and using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; and identifying and marking a position of one or more of a door or a window, including identifying the position of the one or more of the door or the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property.
22. An automatic 3D modeling method, comprising: a 3D model generation act that generates a 3D model of each space based on a first image of each of multiple spaces included in a modeling object; and a 3D model assembling act that assembles, based on position and capture direction information of the first image of each of the multiple spaces being captured, the 3D models of the multiple spaces generated in the 3D model generation act in a global three-dimensional coordinate system, to generate an overall 3D model from the 3D models of the spaces.
23. The automatic 3D modeling method according to claim 22, wherein in the 3D model assembling, local coordinates of the 3D model of a single space are converted into global coordinates based on the position and capture direction information, so as to obtain an overall 3D model of all the spaces.
24. The automatic 3D modeling method according to claim 22, wherein the space is a room; the first image is an indoor image of the room; the 3D model generation includes: one or more image areas of at least one of a floor, a ceiling, and a wall in the first image are identified based on a deep learning method; the identified image area is divided into blocks based on an image processing technology, wherein each block is approximately considered as one plane, image blocks of the floor and the ceiling are located on a horizontal plane, and an image block of the wall is located on a vertical plane; and the 3D model is generated by solving an equation for each plane, wherein for two planes that intersect in the first image, an error between a calculated intersecting line and an actually observed intersecting line is minimized.
25. The automatic 3D modeling method according to claim 24, further comprising: generating a 2D floorplan in the following ways: projecting each surface of the generated 3D model onto a plane parallel to the floor, and merging these projections into a polygon; correcting and simplifying the obtained polygon, including at least one of the following methods: retaining only main vertices of the polygon and deleting small concave or convex rectangles; and using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; assembling the generated 2D floorplans of the rooms in a same two-dimensional coordinate system based on the position and capture direction information, to generate an overall 2D floorplan from the individual 2D floorplans of the rooms; and identifying and marking a position of a door and/or a window, including identifying the position of the door and/or the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property.
 26. The automatic 3D modeling method according to claim 25, wherein in the 2D floorplan generation, 2D floorplans of the multiple rooms are corrected, including correcting wall line directions of all the rooms by using a statistical method, so that wall lines of all the rooms are substantially aligned in the same direction if they were parallel within a specific error range; and the assembling the 2D floorplans of the rooms includes correcting one or more of an overlapping part and a gap between 2D floorplans of two rooms.
27. The automatic 3D modeling method according to claim 24, further comprising: generating a 2D floorplan including: projecting each surface of the overall 3D model generated in the 3D model assembling act onto a plane parallel to the floor, and merging these projections into one or more polygons; correcting and simplifying the obtained polygon using at least one of the following: retaining only main vertices of the polygon and deleting small concave or convex rectangles; and using a computer vision algorithm to detect straight lines in the picture, and then determining the direction of a wall, and aligning edges that are approximately parallel or perpendicular to the direction of the wall to corresponding directions; and identifying and marking a position of one or more of a door and a window, including identifying the position of one or more of the door and the window on the indoor image by using a deep learning method, or determining the position of the door by finding where a room outline is crossed by the track of the tracking map from capturing the first images of multiple rooms of the same property.
28. The automatic 3D modeling method according to claim 22, wherein the space is a room; the first image is an indoor image of the room; and the 3D model generation includes using a computer vision algorithm to identify wall corners in the indoor image and connecting the wall corners to generate a rough model of the room.
29. A photography-based 3D modeling method, comprising the following steps: attaching a mobile device with a photo capture function and a camera to a same camera stand; capturing a plurality of first images at a plurality of photo capture points using one or more of the mobile device and the camera; obtaining multiple second images using the camera or the mobile device during movement of the stand among the plurality of photo capture points; obtaining a position and a capture direction of each photo capture point by optionally using one or more sensors of one or more of the camera and the mobile device; building a tracking map that uses a global coordinate system based on the position of each photo capture point; generating 3D models on the mobile device or a remote server based on one or more first images captured at each photo capture point; placing the individual 3D models of all photo capture points in the global three-dimensional coordinate system based on the position and the capture direction of each photo capture point; and connecting the individual 3D models of multiple photo capture points to generate an overall 3D model that includes multiple photo capture points.